Modeling Cross-morpheme Pro for Korean Large Vocabulary Cont
نویسنده
چکیده
In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon for Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation variations, we have distinguished pronunciation variation rules according to the locations such as within a morpheme, across a morpheme boundary in a compound noun, across a morpheme boundary in an eojeol, and across an eojeol boundary. In 33K-morpheme Korean CSR experiment, an absolute improvement of 1.16% in WER from the baseline performance of 23.17% WER is achieved by modeling crossmorpheme pronunciation variations with a context-dependent multiple pronunciation lexicon.
منابع مشابه
Pronunciation lexicon modeling and design for Korean large vocabulary continuous speech recognition
In this paper, we describe a pronunciation lexicon model which is especially useful for constructing morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. For modeling of cross-morpheme pronunciation variations, we usually used a context-dependent multiple pronunciatio...
متن کاملKorean large vocabulary continuous speech recognition with morpheme-based recognition units
In Korean writing, a space is placed between two adjacent word-phrases, each of which generally corresponds to two or three words in English in a semantic sense. If the word-phrase is used as a recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very large. If a morpheme or a syllable is used instead, a severe inter-morphe...
متن کاملKorean large vocabulary continuous speech recognition using pseudomorpheme units
This paper presents a Korean large vocabulary continuous speech recognition system based on pseudomorpheme units. In Korean, an eojeol (word phrase) is a unit for spacing and a morpheme is the smallest unit with semantic meaning. If the eojeol is used as the dictionary and language modeling unit, the number of the unit becomes enormous. Instead we propose to use modified morpheme or pseudomorph...
متن کاملLarge vocabulary continuous speech recognition based on cross-morpheme phonetic information
In this paper, we present a novel method to regulate lexical connections among morpheme-based pronunciation lexicons for Korean large vocabulary continuous speech recognition (LVCSR) systems. A pronunciation dictionary plays an important role in subword-based LVCSR in that pronunciation variations such as coarticulation will deteriorate the performance of an LVCSR system if it is not well accou...
متن کاملUnlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS
This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...
متن کامل